Skip to content

MDEV-39061 mariadb-backup compatible wrappers for BACKUP SERVER#5140

Open
Thirunarayanan wants to merge 25 commits into
MDEV-14992from
MDEV-39061
Open

MDEV-39061 mariadb-backup compatible wrappers for BACKUP SERVER#5140
Thirunarayanan wants to merge 25 commits into
MDEV-14992from
MDEV-39061

Conversation

@Thirunarayanan

Copy link
Copy Markdown
Member

scripts/mariabackup/mariabackup.sh: a drop-in wrapper that
lets existing mariabackup --backup invocations drive the server-side
BACKUP SERVER command without changing user scripts.

mariabackup.sh translates
--backup into "BACKUP SERVER TO '<dir>'" via the mariadb client,
forwarding connection options, and layers
--stream/--compress as tar/gzip pipelines on the result. It
validates the target directory up front and rejects the

- mbstream.sh shims the mbstream CLI onto tar, dropping
mbstream-only flags (-p/--parallel) so legacy pipelines keep working.
- README.md maps every supported --backup option to its BACKUP SERVER
equivalent and documents current limitations.

Add include/have_mariabackup_wrapper.inc redirects
$XTRABACKUP to the wrapper so a test opts in by sourcing one
file; skips when the wrapper, bash, or the mariadb client
is unavailable.

wrapper_basic.test : Exercises full backup, streaming, compression,
the ignored legacy options

@Thirunarayanan Thirunarayanan requested a review from dr-m May 28, 2026 09:52
@CLAassistant

CLAassistant commented May 28, 2026

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
0 out of 2 committers have signed the CLA.

❌ dr-m
❌ Thirunarayanan
You have signed the CLA already but the status is still pending? Let us recheck it.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces server-side backup support (BACKUP SERVER) for MariaDB, implementing backup and log archiving mechanisms for both the InnoDB and Aria storage engines. It also includes a compatibility wrapper script (mariabackup.sh) to map legacy mariabackup CLI commands to the new server-side SQL interface. The code review identified several critical issues, including a potential server crash in backup_innodb.cc due to invalid format arguments in error reporting, null pointer dereferences and assertion failures in both engines when backup steps are executed out of order or fail to initialize, and security vulnerabilities (command injection) and argument parsing bugs in the shell wrapper script.

{
fail:
fail= 1;
my_error(ER_ERROR_ON_RENAME, MYF(ME_ERROR_LOG), s, d, errno);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Passing s and d (which are int file descriptors on Unix) to my_error with the ER_ERROR_ON_RENAME format specifier will cause a segmentation fault/crash because ER_ERROR_ON_RENAME expects string arguments (const char*). They should be replaced with the actual file names.

Suggested change
my_error(ER_ERROR_ON_RENAME, MYF(ME_ERROR_LOG), s, d, errno);
my_error(ER_ERROR_ON_RENAME, MYF(ME_ERROR_LOG), "ib_logfile101", dst.c_str(), errno);

Comment on lines +347 to +352
int aria_backup_end(THD *thd, bool abort) noexcept
{
int ret_val= aria_backup->end(thd, abort);
aria_backup.reset();
return ret_val;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

If aria_backup_start was never called (e.g., because another plugin's backup_start failed earlier in plugin_foreach_with_mask), aria_backup will be nullptr. When plugin_foreach_with_mask subsequently calls backup_end for all plugins, aria_backup_end will dereference aria_backup without a null check, causing a server crash. A null check should be added.

int aria_backup_end(THD *thd, bool abort) noexcept
{
  if (!aria_backup)
    return 0;
  int ret_val= aria_backup->end(thd, abort);
  aria_backup.reset();
  return ret_val;
}

Comment on lines +195 to +198
int end(THD *thd, bool abort) noexcept
{
int fail= 0;
log_sys.latch.wr_lock();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If backup_start was never called or failed, max_first_lsn is 0. In this case, calling InnoDB_backup::end will trigger assertions like ut_ad(log_sys.archive) which can fail and crash the debug server. We should return early if max_first_lsn is 0.

  int end(THD *thd, bool abort) noexcept
  {
    int fail= 0;
    log_sys.latch.wr_lock();
    if (!max_first_lsn)
    {
      log_sys.latch.wr_unlock();
      return 0;
    }

Comment on lines +276 to +283
if (!max_first_lsn)
{
ut_ad(queue.empty());
if (old_size)
delete_logs();
logs.clear();
}
log_sys.latch.wr_unlock();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If backup_start was never called or failed, max_first_lsn is 0. In this case, calling InnoDB_backup::fini will check the transaction state and trigger the assertion ut_ad("invalid state" == 0) because the transaction state is not TRX_STATE_BACKUP. We should return early if max_first_lsn is 0.

Suggested change
if (!max_first_lsn)
{
ut_ad(queue.empty());
if (old_size)
delete_logs();
logs.clear();
}
log_sys.latch.wr_unlock();
if (!max_first_lsn)
{
ut_ad(queue.empty());
if (old_size)
delete_logs();
logs.clear();
log_sys.latch.wr_unlock();
return 0;
}
log_sys.latch.wr_unlock();

Comment on lines +315 to +328
if [[ -n "$FINAL_INCLUDE" ]]; then
echo "Setting backup_include='$FINAL_INCLUDE'" >&2
mariadb $MARIADB_OPTS -e "SET GLOBAL backup_include='$FINAL_INCLUDE'"
fi

if [[ -n "$FINAL_EXCLUDE" ]]; then
echo "Setting backup_exclude='$FINAL_EXCLUDE'" >&2
mariadb $MARIADB_OPTS -e "SET GLOBAL backup_exclude='$FINAL_EXCLUDE'"
fi

# Execute BACKUP SERVER
SQL="BACKUP SERVER TO '$TARGET_DIR'"
echo "Executing: $SQL" >&2
mariadb $MARIADB_OPTS -e "$SQL"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The parameters FINAL_INCLUDE, FINAL_EXCLUDE, and TARGET_DIR are interpolated into shell commands. To effectively prevent command injection, we should validate these inputs using a strict whitelist of allowed characters (such as alphanumeric characters and a limited set of safe symbols like '-', '_', '.') instead of relying on a blacklist or simple escaping.

Suggested change
if [[ -n "$FINAL_INCLUDE" ]]; then
echo "Setting backup_include='$FINAL_INCLUDE'" >&2
mariadb $MARIADB_OPTS -e "SET GLOBAL backup_include='$FINAL_INCLUDE'"
fi
if [[ -n "$FINAL_EXCLUDE" ]]; then
echo "Setting backup_exclude='$FINAL_EXCLUDE'" >&2
mariadb $MARIADB_OPTS -e "SET GLOBAL backup_exclude='$FINAL_EXCLUDE'"
fi
# Execute BACKUP SERVER
SQL="BACKUP SERVER TO '$TARGET_DIR'"
echo "Executing: $SQL" >&2
mariadb $MARIADB_OPTS -e "$SQL"
if [[ -n "$FINAL_INCLUDE" && ! "$FINAL_INCLUDE" =~ ^[a-zA-Z0-9_.-]+$ ]]; then
echo "Error: Invalid characters in FINAL_INCLUDE" >&2
exit 1
fi
References
  1. When validating input that will be interpolated into shell commands, use a strict whitelist of allowed characters (e.g., alphanumeric and a limited set of safe symbols like '-', '_', '.') instead of a blacklist to effectively prevent command injection.

TABLES_PATTERN=""
TABLES_EXCLUDE_PATTERN=""
TABLES_FILE=""
MARIADB_OPTS=""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Handling MARIADB_OPTS as a flat string will break if any option contains spaces or special characters (e.g., --password="my secret pass"). We should use a bash array instead.

Suggested change
MARIADB_OPTS=""
MARIADB_OPTS=()

Comment thread scripts/mariabackup/mariabackup.sh Outdated
Comment on lines +74 to +100
--user=*|--password=*|--host=*|--port=*|--socket=*)
# Connection options (long form) - pass through to mariadb client
MARIADB_OPTS="$MARIADB_OPTS $1"
shift
;;
-u|-p|-h|-P|-S)
# Short form. If the next argv looks like a value (not another
# flag), consume it as well — supports both `-u USER` and bare
# `-p` (which mariadb client treats as a password prompt).
if [[ -n "${2-}" && "$2" != -* ]]; then
MARIADB_OPTS="$MARIADB_OPTS $1 $2"
shift 2
else
MARIADB_OPTS="$MARIADB_OPTS $1"
shift
fi
;;
-u*|-p*|-h*|-P*|-S*)
# Bundled short form: -uUSER, -pSECRET, -hHOST, ...
MARIADB_OPTS="$MARIADB_OPTS $1"
shift
;;
--defaults-file=*|--defaults-extra-file=*)
# Config file options - pass through to mariadb client
MARIADB_OPTS="$MARIADB_OPTS $1"
shift
;;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

We should append connection options to the MARIADB_OPTS array instead of concatenating them into a flat string to correctly handle arguments with spaces or special characters.

        --user=*|--password=*|--host=*|--port=*|--socket=*)
            # Connection options (long form) - pass through to mariadb client
            MARIADB_OPTS+=("$1")
            shift
            ;;
        -u|-p|-h|-P|-S)
            # Short form. If the next argv looks like a value (not another
            # flag), consume it as well — supports both `-u USER` and bare
            # `-p` (which mariadb client treats as a password prompt).
            if [[ -n "${2-}" && "$2" != -* ]]; then
                MARIADB_OPTS+=("$1" "$2")
                shift 2
            else
                MARIADB_OPTS+=("$1")
                shift
            fi
            ;;
        -u*|-p*|-h*|-P*|-S*)
            # Bundled short form: -uUSER, -pSECRET, -hHOST, ...
            MARIADB_OPTS+=("$1")
            shift
            ;;
        --defaults-file=*|--defaults-extra-file=*)
            # Config file options - pass through to mariadb client
            MARIADB_OPTS+=("$1")
            shift
            ;;

Comment thread sql/sql_backup.cc Outdated
}

#ifndef _WIN32
int dir= open(target.str, O_DIRECTORY);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To be fully standard and portable, open with O_DIRECTORY should explicitly include the access mode O_RDONLY.

Suggested change
int dir= open(target.str, O_DIRECTORY);
int dir= open(target.str, O_RDONLY | O_DIRECTORY);

explicit Aria_backup(THD *thd, Target target) noexcept
: target(target)
#ifndef _WIN32
, datadir_fd(open(maria_data_root, O_DIRECTORY))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To be fully standard and portable, open with O_DIRECTORY should explicitly include the access mode O_RDONLY.

    , datadir_fd(open(maria_data_root, O_RDONLY | O_DIRECTORY))

@grooverdan

Copy link
Copy Markdown
Member

Hey @Thirunarayanan , have you thought of how this is going to end up in packaging? It is replacing the original mariadb-backup?

It needs some INSTALL/INSTALL_SCRIPT cmake directives around this to give it a install location and a cmake component. debian installation/packaging would need the relevant debian/{package}.install to include te script.

dr-m added 2 commits May 29, 2026 17:11
This introduces a basic driver Sql_cmd_backup, storage engine interfaces,
and basic copying of InnoDB data files.
On Windows, we pass a target directory name; elsewhere, we pass a
target directory handle.

backup_target: A structured data type to represent a directory or a
stream. On Microsoft Windows, we must use directory paths because
there is no variant of CopyFileEx() that would work on file handles.

copy_entire_file(): A file copying service for POSIX systems.

copy_file(): A sparse file-copying service for POSIX systems.

backup_context: An InnoDB backup context, attached to trx->lock.backup
so that context can exist between InnoDB_backup::end(), which is
releasing all locks, and InnoDB_backup::fini() in the same thread,
which is expected to finalize the backup without modifying files
in the server data directory.

fil_space_t::write_or_backup: Keep track of in-flight page writes and
pending backup operation. We must not allow them concurrently, because
that could lead into torn pages in the backup.

fil_space_t::backup_end: The first page number that is not being backed up
(by default 0, to indicate that no backup is in progress).

TRX_STATE_BACKUP: A special InnoDB transaction state indicating association
with BACKUP SERVER, which allows us to pass some context in trx_t from
innodb_backup_end() to innodb_backup_finalize().

log_t::backup: Whether BACKUP SERVER is in progress. The purpose of this
is to make BACKUP SERVER prevent the concurrent execution of
SET GLOBAL innodb_log_archive=OFF or SET GLOBAL innodb_log_file_size
when innodb_log_archive=OFF.

log_sys.archived_checkpoint: Keep track of the earliest available
checkpoint, corresponding to log_sys.archived_lsn. This reflects
SET GLOBAL innodb_log_recovery_start (which is settable now), for
incremental backup.

buf_flush_list_space(): Check for concurrent backup before writing each
page. This is inefficient, but this function may be invoked from multiple
threads concurrently, and it cannot be changed easily, especially for
fil_crypt_thread().
dr-m and others added 2 commits June 5, 2026 06:57
scripts/mariabackup/mariabackup.sh: a drop-in wrapper that lets
existing mariabackup invocations drive the server-side BACKUP
SERVER command without changing user scripts.

mariabackup.sh covers all four mariabackup modes. --backup
translates into "BACKUP SERVER TO '<dir>'" via the mariadb client,
forwarding connection options, and layers --stream/--compress as
tar/gzip pipelines on the result. --prepare runs mariadbd
--bootstrap on backup.cnf so InnoDB applies the archived redo log.
--copy-back / --move-back drop a prepared backup into the datadir
via cp -r / mv.

--prepare --incremental-dir copies the incremental's ib_logfile*
into the base and advances innodb_log_recovery_target;
innodb_log_recovery_start stays pinned to the base checkpoint.

--apply-log-only maps to --innodb-force-recovery=3 to skip
rollback between incrementals.

--rollback-xa runs two passes: normal recovery, then a second
bootstrap with --tc-heuristic-recover=ROLLBACK.

--copy-back / --move-back refuse a non-empty datadir unless
--force-non-empty-directories is set, and print the post-action
chown / systemctl start commands.

For incremental --backup, innodb_log_archive_start is treated as a
startup-only, read-only server invariant: the wrapper reads
@@global.innodb_log_archive_start and fails fast if the archive
floor exceeds the base backup's end LSN.

Limitations:
 --export is accepted but not yet implemented; the wrapper prints
a warning and runs plain recovery without producing the per-table
.cfg files needed for ALTER TABLE ... IMPORT TABLESPACE.

mbstream.sh shims the mbstream CLI onto tar, dropping
mbstream-only flags (-p/--parallel) so legacy pipelines keep
working.

README.md maps every supported option per mode to its BACKUP
SERVER equivalent and documents the backup.cnf format.

Add include/have_mariabackup_wrapper.inc redirects $XTRABACKUP to
the wrapper so a test opts in by sourcing one file; skips when the
wrapper, bash, or the mariadb client is unavailable.

wrapper_basic.test: exercises full backup, streaming, compression,
the ignored legacy options.

@dr-m dr-m left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rebase this on the current branch.

@@ -0,0 +1,82 @@
--source include/have_mariabackup_wrapper.inc

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be implemented in a way that is compatible with the .combinations logic?

I would like most of --suite=mariabackup to be run both with the genuine mariadb-backup and with the BACKUP SERVER based wrapper scripts. Currently, this is the only test that exercises the wrappers, and it fails to test the original executables to demonstrate compatibility between them and the wrappers.

Comment on lines +30 to +40
--echo #
--echo # --stream=mbstream emits a valid tar archive to stdout
--echo #
--let $targetdir=$MYSQLTEST_VARDIR/tmp/bk_stream
--let $streamfile=$MYSQLTEST_VARDIR/tmp/bk_stream.tar
--exec $XTRABACKUP $defaults --backup --target-dir=$targetdir --stream=mbstream > $streamfile 2>$logfile
--exec tar -tf $streamfile > /dev/null
--let SEARCH_PATTERN=Creating tar stream
--source include/search_pattern_in_file.inc
--rmdir $targetdir
--remove_file $streamfile

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure if this should be in the current scope. We don’t have any MDEV-38362 streaming backup yet. What should be more in the scope is demonstrating that the mbstream wrapper script is syntactically compatible with its namesake utility.

@@ -0,0 +1,19 @@
#!/bin/bash

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we write this for the POSIX shell? Bourne Again Shell (bash) is not available in all environments, for good reasons: reduced resource usage as well as attack surface; remember https://en.wikipedia.org/wiki/Shellshock_(software_bug)

Comment on lines +7 to +13
-p|--parallel)
SKIP_NEXT=1
;;
-p*)
;;
--parallel=*)
;;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are necessary but not sufficient rules. mbstream -xp4 would be a valid invocation of the original utility (equivalent to -x -p4), but not of GNU tar.

I did test that mbstream indeed treats anything after -p as a single argument. For example, -p4t and -p4x would report the following, respectively:

Warning: option 'parallel': signed value 4398046511104 adjusted to 2147483647
Unknown suffix 'x' used for variable 'parallel' (value '4x'). Legal suffix characters are: K, M, G, T, P, E
mbstream: Error while setting value '4x' to 'parallel'

Please add a test file that covers both the real mbstream and this wrapper. Both with some invalid and valid invocation. For example, mbstream -t is not allowed, while tar -t is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants